Using the ABBYY OCR Object

The ABBYY OCR object in Advanced Process Automation (APA) enables you to capture text from PDFs and images using Optical Character Recognition (OCR). This object was added in APA 7.6.

This OCR functionality is included in the Connectivity.OCR library.

The functionality was updated in APA 7.7 to add support for:

Reading cells in complex, asymmetrical tables.
Reading text in various languages.
Retrieving a list of screen element rectangles with the specified word.

To test the new functionality, download the sample project file here. For example, this table has a different number of columns per row.

The Get Table function returns a list of rows: the first row has four cells, the second row has eight cells, and so on.
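A result of this shape can be pictured as a jagged list, in which each row holds its own number of cells. The Python sketch below only illustrates that data shape; it is not APA's API, and the name table_rows and all cell values are made-up placeholders.

```python
# Illustrative only: a jagged list of rows, as a table with a different
# number of columns per row might be represented. Values are placeholders.
table_rows = [
    ["A1", "B1", "C1", "D1"],                          # first row: 4 cells
    ["A2", "B2", "C2", "D2", "E2", "F2", "G2", "H2"],  # second row: 8 cells
]

# Rows differ in length, so process each row on its own instead of
# assuming a fixed column count for the whole table.
cell_counts = [len(row) for row in table_rows]

for index, row in enumerate(table_rows, start=1):
    print(f"Row {index}: {len(row)} cells")
```

Any logic that consumes such a result should look up cells per row, because a column index that is valid in one row may not exist in another.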

ABBYY OCR Object Functionality

You can review the available functions of the ABBYY OCR object from the Connectivity.OCR library in the Real-Time Designer.

To view the Connectivity.OCR library functionality:

In Real-Time Designer, open the Project tab.
Under the References section, expand the Library References node, and select Connectivity.OCR.
Open the Functionality tab, and from the Type drop-down list, select ABBYY OCR.
Open the Business Entities tab and expand the OCR type.

The following OCR properties are available:

Property	Description
Current Page	The current page of the document.
Number of Pages	The number of pages in the document.

The following OCR functions are available:

Function	Description
Close	Disconnects the ABBYY OCR object from ABBYY. This is important for licensing purposes.
Determine Brightness	Determines the screen element rectangle brightness.
Get Block Text	Retrieves the text from a specified block on the current page. The first block in the image is numbered 1. All the text in the block is returned as a single text value.
Get Block Text with Rectangles	Retrieves a list of screen element rectangles and text from a specified block on the current page. The first block in the image is numbered 1. The list is returned via an instance of the ABBYY OCR Word business entity and includes the screen element rectangles and text.
Get Current Page Image	Returns an image object of the current page.
Get Table	Gets a list of the rows and cells (with their values) for the specified table on the current page.
Get Table Cells	Gets a list of ABBYY OCR Word business entity instances.
Get Text	Retrieves the text from the current page. All the text is returned as a single text value.
Get Text with Rectangles	Retrieves a list of the screen element rectangles and text from the current page. The list is returned via an instance of the ABBYY OCR Word business entity and includes the screen element rectangles and text.
Get Word Rectangles	Retrieves a list of screen element rectangles with the specified word.
Load from File	Loads the required file.
Load from Image	Loads the ABBYY OCR object with the specified image.
Set Handwriting Mode	Sets the handwriting mode: Simple Text, Underlined Text, Text In Frame, Gray Boxes, Char Box Areas, Simple Comb, Comb in Frame, Partitioned Frame.
Set Languages	Sets the languages, for example, Russian, English, Hebrew, and German.
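The idea behind functions such as Get Text with Rectangles and Get Word Rectangles can be sketched as a list of recognized words, each paired with its rectangle, that is filtered by a search word. The Python below is a conceptual model only, not the Connectivity.OCR library: the Rect and OcrWord types, their fields, the find_word_rectangles helper, and the case-insensitive matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Hypothetical screen element rectangle, in pixels."""
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class OcrWord:
    """Hypothetical stand-in for a recognized word and its rectangle."""
    text: str
    rect: Rect


def find_word_rectangles(words: List[OcrWord], target: str) -> List[Rect]:
    """Return the rectangle of every recognized word equal to target.

    Matching is case-insensitive here; that is an assumption of this
    sketch, not documented behavior of the product.
    """
    wanted = target.casefold()
    return [word.rect for word in words if word.text.casefold() == wanted]


# Placeholder recognition result for one page.
words = [
    OcrWord("Invoice", Rect(10, 10, 80, 20)),
    OcrWord("Total", Rect(10, 40, 50, 20)),
    OcrWord("total", Rect(10, 200, 50, 20)),
]

matches = find_word_rectangles(words, "Total")
```

Returning rectangles rather than text lets a caller locate every occurrence of the word on the page, for example to click or highlight each one.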